20 research outputs found

    HCmodelSets: An R package for specifying sets of well-fitting models in high dimensions

    In the context of regression with a large number of explanatory variables, Cox and Battey (2017) emphasize that if there are alternative reasonable explanations of the data that are statistically indistinguishable, one should aim to specify as many of these explanations as is feasible. The standard practice, by contrast, is to report a single model effective for prediction. The present paper illustrates the R implementation of the new ideas in the package HCmodelSets, using simple reproducible examples and real data. Results of some simulation experiments are also reported.

    Gaussian process nowcasting: application to COVID-19 mortality reporting

    Updating observations of a signal due to the delays in the measurement process is a common problem in signal processing, with prominent examples in a wide range of fields. An important example of this problem is the nowcasting of COVID-19 mortality: given a stream of reported counts of daily deaths, can we correct for the delays in reporting to paint an accurate picture of the present, with uncertainty? Without this correction, raw data will often mislead by suggesting an improving situation. We present a flexible approach using a latent Gaussian process that is capable of describing the changing auto-correlation structure present in the reporting time-delay surface. This approach also yields robust estimates of uncertainty for the estimated nowcasted numbers of deaths. We test assumptions in model specification such as the choice of kernel or hyperpriors, and evaluate model performance on a challenging real dataset from Brazil. Our experiments show that Gaussian process nowcasting performs favourably against both comparable methods and a small sample of expert human predictions. Our approach has substantial practical utility in disease modelling: by applying it to COVID-19 mortality data from Brazil, where reporting delays are large, we can make informative predictions on important epidemiological quantities such as the current effective reproduction number.

    Inference of COVID-19 epidemiological distributions from Brazilian hospital data

    Knowing COVID-19 epidemiological distributions, such as the time from patient admission to death, is directly relevant to effective primary and secondary care planning, and moreover, the mathematical modelling of the pandemic generally. We determine epidemiological distributions for patients hospitalised with COVID-19 using a large dataset (N = 21,000–157,000) from the Brazilian Sistema de Informação de Vigilância Epidemiológica da Gripe database. A joint Bayesian subnational model with partial pooling is used to simultaneously describe the 26 states and one federal district of Brazil, and shows significant variation in the mean of the symptom-onset-to-death time, ranging from 11.2 to 17.8 days across the different states, with a mean of 15.2 days for Brazil. We find strong evidence in favour of specific probability density function choices: for example, the gamma distribution gives the best fit for onset-to-death and the generalised log-normal for onset-to-hospital-admission. Our results show that epidemiological distributions have considerable geographical variation, and provide the first estimates of these distributions in a low and middle-income setting. At the subnational level, variation in COVID-19 outcome timings is found to be correlated with poverty, deprivation and segregation levels, and weaker correlation is observed for mean age, wealth and urbanicity.

    Estimating the COVID-19 infection fatality ratio accounting for seroreversion using statistical modelling

    Background: The infection fatality ratio (IFR) is a key statistic for estimating the burden of coronavirus disease 2019 (COVID-19) and has been continuously debated throughout the COVID-19 pandemic. The age-specific IFR can be quantified using antibody surveys to estimate total infections, but requires consideration of delay distributions for the time from infection to seroconversion, time to death, and time to seroreversion (i.e. antibody waning) alongside serologic test sensitivity and specificity. Previous IFR estimates have not fully propagated uncertainty or accounted for these potential biases, particularly seroreversion. Methods: We built a Bayesian statistical model that incorporates these factors and applied this model to simulated data and 10 serologic studies from different countries. Results: We demonstrate that seroreversion becomes a crucial factor as time accrues but is less important during first-wave, short-term dynamics. We additionally show that disaggregating surveys by regions with higher versus lower disease burden can inform serologic test specificity estimates. The overall IFR in each setting was estimated at 0.49-2.53%. Conclusion: We developed a robust statistical framework to account for full uncertainties in the parameters determining IFR. We provide code for others to apply these methods to further datasets and future epidemics.

    Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus, Brazil

    Cases of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection in Manaus, Brazil, resurged in late 2020 despite previously high levels of infection. Genome sequencing of viruses sampled in Manaus between November 2020 and January 2021 revealed the emergence and circulation of a novel SARS-CoV-2 variant of concern. Lineage P.1 acquired 17 mutations, including a trio in the spike protein (K417T, E484K, and N501Y) associated with increased binding to the human ACE2 (angiotensin-converting enzyme 2) receptor. Molecular clock analysis shows that P.1 emergence occurred around mid-November 2020 and was preceded by a period of faster molecular evolution. Using a two-category dynamical model that integrates genomic and mortality data, we estimate that P.1 may be 1.7- to 2.4-fold more transmissible and that previous (non-P.1) infection provides 54 to 79% of the protection against infection with P.1 that it provides against non-P.1 lineages. Enhanced global genomic surveillance of variants of concern, which may exhibit increased transmissibility and/or immune evasion, is critical to accelerate pandemic responsiveness.

    Estimation, forecasting and anomaly detection for nonstationary streams using adaptive estimation

    Streaming data provides substantial challenges for data analysis. From a computational standpoint, these challenges arise from constraints related to computer memory and processing speed. Statistically, the challenges relate to constructing procedures that can handle so-called concept drift, the tendency of future data to have different underlying properties to current and historic data. The issue of handling structure, such as trend and periodicity, remains a difficult problem for streaming estimation. We propose the real-time adaptive component (RAC), a penalized-regression modeling framework that satisfies the computational constraints of streaming data, and provides the capability for dealing with concept drift. At the core of the estimation process are techniques from adaptive filtering. The RAC procedure adopts a specified basis to handle local structure, along with a least absolute shrinkage operator-like penalty procedure to handle overfitting. We enhance the RAC estimation procedure with a streaming anomaly detection capability. The experiments with simulated data suggest the procedure can be considered as a competitive tool for a variety of scenarios, and an illustration with real cyber-security data further demonstrates the promise of the method.

    Streaming changepoint detection for transition matrices

    Sequentially detecting multiple changepoints in a data stream is a challenging task. Difficulties relate to both computational and statistical aspects, and in the latter, specifying control parameters is a particular problem. Choosing control parameters typically relies on unrealistic assumptions, such as the distributions generating the data, and their parameters, being known. This is implausible in the streaming paradigm, where several changepoints will exist. Further, current literature is mostly concerned with streams of continuous-valued observations, and focuses on detecting a single changepoint. There is a dearth of literature dedicated to detecting multiple changepoints in transition matrices, which arise from a sequence of discrete states. This paper makes the following contributions: a complete framework is developed for adaptively and sequentially estimating a Markov transition matrix in the streaming data setting. A change detection method is then developed, using a novel moment matching technique, which can effectively monitor for multiple changepoints in a transition matrix. This adaptive detection and estimation procedure for transition matrices, referred to as ADEPT-M, is compared to several change detectors on synthetic data streams, and is implemented on two real-world data streams – one consisting of over nine million HTTP web requests, and the other being a well-studied electricity market data set.

    Inference of COVID-19 epidemiological distributions from Brazilian hospital data

    Knowing COVID-19 epidemiological distributions, such as the time from patient admission to death, is directly relevant to effective primary and secondary care planning, and moreover, the mathematical modelling of the pandemic generally. We determine epidemiological distributions for patients hospitalized with COVID-19 using a large dataset (N = 21 000 − 157 000) from the Brazilian Sistema de Informação de Vigilância Epidemiológica da Gripe database. A joint Bayesian subnational model with partial pooling is used to simultaneously describe the 26 states and one federal district of Brazil, and shows significant variation in the mean of the symptom-onset-to-death time, with ranges between 11.2 and 17.8 days across the different states, and a mean of 15.2 days for Brazil. We find strong evidence in favour of specific probability density function choices: for example, the gamma distribution gives the best fit for onset-to-death and the generalized lognormal for onset-to-hospital-admission. Our results show that epidemiological distributions have considerable geographical variation, and provide the first estimates of these distributions in a low and middle-income setting. At the subnational level, variation in COVID-19 outcome timings is found to be correlated with poverty, deprivation and segregation levels, and weaker correlation is observed for mean age, wealth and urbanicity.